description: |
This dataset provides a simple example of survival and censoring. It provides an intuitive explanation of how survival probabilities are estimated.
description: The data represent survival times for a 500-patient subset of data from
the Worcester Heart Attack Study. You can find more information about this data
set in Chapter 1 of Hosmer, Lemeshow, and May.
Worcester Heart Attack Study, 2
id:
label: a sequential code from 1 to 500
age:
scale: ratio
label: Age at Admission
unit: years
gender:
scale: binary
value:
Male: 0
Female: 1
Worcester Heart Attack Study, 3
hr:
scale: ratio
label: Initial Heart Rate
unit: Beats per minute
sysbp:
scale: ratio
label: Initial Systolic Blood Pressure
unit: mmHg
diasbp:
scale: ratio
label: Initial Diastolic Blood Pressure
unit: mmHg
Worcester Heart Attack Study, 4
bmi:
scale: ratio
label: Body Mass Index
unit: kg/m^2
cvd:
scale: binary
label: History of Cardiovascular Disease
value:
'FALSE': 0
'TRUE': 1
miord:
scale: binary
label: MI Order
value:
First: 0
Recurrent: 1
mitype:
scale: binary
label: MI Type
value:
non Q-wave: 0
Q-wave: 1
Worcester Heart Attack Study, 8
year:
scale: ordinal
label: Cohort Year
value:
yr1997: 1
yr1999: 2
yr2001: 3
admitdate:
label: Admission Date
format: mm/dd/yyyy
disdate:
label: Hospital Discharge Date
format: mm/dd/yyyy
Worcester Heart Attack Study, 9
fdate:
label: Date of last Follow Up
format: mm/dd/yyyy
los:
scale: ratio
label: Length of Hospital Stay
unit: Days
dstat:
scale: binary
label: Discharge Status from Hospital
value:
Alive: 0
Dead: 1
Worcester Heart Attack Study, 9
lenfol:
scale: ratio
label: Follow Up Time
unit: days
fstat:
scale: binary
label: Vital Status
value:
Alive: 0
Dead: 1
Event count
# A tibble: 2 × 2
fstat n
<dbl> <int>
1 0 285
2 1 215
Overall Kaplan-Meier curve
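A minimal sketch of how a Kaplan-Meier curve is computed, written in Python rather than the R that produced the outputs in these slides. The function name `kaplan_meier` and the six toy observations are hypothetical; in the data set the corresponding columns would be `lenfol` (time) and `fstat` (event indicator).

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Multiply by the conditional probability of surviving past t.
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data, not taken from the Worcester Heart Attack Study.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects never trigger a drop in the curve, but they still count in the number at risk until their follow-up ends.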
Live demo, Overall Kaplan-Meier curve
Break #2
What you have learned
Overall Kaplan-Meier curve
What’s coming next
The log rank test
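A rough sketch of the two-group log rank calculation, in Python rather than R. At each event time it compares the observed number of events in one group with the number expected if both groups shared the same hazard. The function name `log_rank` and the four toy observations are hypothetical, and groups are assumed to be coded 0/1 (as `gender` is in the data dictionary).

```python
import math

def log_rank(times, events, groups):
    """Two-group log rank test; returns (chi_square, p_value)."""
    observed1 = expected1 = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Numbers at risk overall and in group 1 just before time t.
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        # Events at time t overall and in group 1.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed1 - expected1) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical toy data: group 1 has the two earliest events.
chi_square, p_value = log_rank([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0])
print(round(chi_square, 3), round(p_value, 3))
```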
Cox regression for gender
Call:
coxph(formula = Surv(lenfol, fstat) ~ gender, data = whas_4)
coef exp(coef) se(coef) z p
gender 0.3417 1.4074 0.1526 2.24 0.0251
Likelihood ratio test=4.95 on 1 df, p=0.02606
n= 461, number of events= 176
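The columns of this output are linked by simple arithmetic, which the following Python sketch reproduces from the printed coefficient and standard error. The 95% interval is an addition that the output above does not print, and it assumes the usual Wald (normal) approximation.

```python
import math

# Values copied from the coxph output above.
coef = 0.3417
se = 0.1526

hazard_ratio = math.exp(coef)            # the exp(coef) column
z = coef / se                            # Wald statistic, the z column
p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value

# Approximate 95% Wald interval for the hazard ratio (an assumption,
# not part of the printed output).
lower = math.exp(coef - 1.96 * se)
upper = math.exp(coef + 1.96 * se)

print(f"HR = {hazard_ratio:.3f}, z = {z:.2f}, p = {p:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because gender is coded Male = 0 and Female = 1, the hazard ratio compares women to men.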
Live demo, The log rank test
Break #3
What you have learned
The log rank test
What’s coming next
The hazard function
Life insurance example
Probabilities for ages 21 through 41
Probabilities for ages 95 through 99
Why are these probabilities not comparable?
Unequal time intervals
Fix by computing a rate
Non-uniform probabilities over the interval
Fix by looking at narrow interval
No adjustment for survivorship
Fix by dividing by survival probability
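The three fixes can be sketched in a few lines of Python. The function name `average_hazard` and all four survival probabilities are made up for illustration; they are not the life insurance figures from the slides.

```python
def average_hazard(surv_start, surv_end, width_years):
    """Approximate hazard per year over one interval.

    surv_start  -- probability of surviving to the start of the interval
    surv_end    -- probability of surviving to the end of the interval
    width_years -- length of the interval in years
    """
    prob_death = surv_start - surv_end     # unconditional probability
    rate = prob_death / width_years        # fix 1: put intervals on a per-year basis
    return rate / surv_start               # fix 3: condition on surviving to the start
    # Fix 2 corresponds to making width_years small, so the rate
    # is close to constant within the interval.

# Made-up survival probabilities for a young and a very old interval.
young = average_hazard(0.98, 0.95, 20)
old = average_hazard(0.02, 0.005, 4)
print(round(young, 5), round(old, 4))
```

With these made-up numbers the raw probability of dying is larger in the young interval (0.03 versus 0.015), yet the hazard is far larger in the old one.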
Hazard function, definition
\[h(t)=\lim_{\Delta t \rightarrow 0}\frac{P[t \le T \le t+\Delta t]/\Delta t}{P[T \ge t]}\]
\[h(t)=\frac{f(t)}{S(t)}\]
where \(f\) is the density function, and
\(S\) is the survival function (\(S(t)=1-F(t)\))
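The two forms agree because, for a continuous survival time, the limit in the numerator is the density \(f(t)\) and the denominator \(P[T \ge t]\) equals \(S(t)\). As a worked example, assume an exponential survival time with rate \(\lambda\) (an illustrative choice, not a distribution fitted to the data in these slides):
\[f(t)=\lambda e^{-\lambda t}, \quad S(t)=e^{-\lambda t}, \quad h(t)=\frac{f(t)}{S(t)}=\frac{\lambda e^{-\lambda t}}{e^{-\lambda t}}=\lambda\]
Under this assumption the hazard is constant over time, even though the density \(f(t)\) declines.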
Hazard function, example
Hazard function on a log scale
Break #4
What you have learned
The hazard function
What’s coming next
The Cox regression model
Mean ages for men and women
Unadjusted and adjusted Cox regression models for gender
Live demo, The Cox regression model
Break #5
What you have learned
The Cox regression model
What’s coming next
Assumptions and data management
Assumptions of the log rank test
Independence
From one patient to another
Of censoring mechanism
Assumptions of the Cox regression model
Independence
Proportional hazards assumption
Possible violations of proportional hazards
Survival curves that cross
One curve flattening out over time
Curves diverge only at later times
Sample size issues
Rule of 50
Rule of 15
Use ISO format for dates
Understand the internal storage system for dates
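A short Python sketch of both points, using the mm/dd/yyyy format listed in the data dictionary. The helper name `to_iso` and the two dates are hypothetical.

```python
from datetime import date, datetime

def to_iso(mdy):
    """Convert an mm/dd/yyyy string to ISO 8601 (yyyy-mm-dd)."""
    return datetime.strptime(mdy, "%m/%d/%Y").date().isoformat()

# Hypothetical admission dates, not taken from the data set.
raw = ["12/31/1996", "01/13/1997"]
iso = [to_iso(d) for d in raw]

# Sorting the raw strings puts January 1997 before December 1996;
# sorting the ISO strings gives chronological order.
raw_sorted = sorted(raw)
iso_sorted = sorted(iso)

# Internally a date is a day count from some origin. Python's ordinal
# counts from 0001-01-01; other systems use other origins, so rely on
# differences between dates rather than on the raw counts.
first, second = (date.fromisoformat(d) for d in iso_sorted)
days_between = second.toordinal() - first.toordinal()
print(iso_sorted, days_between)
```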
Date management
The three dates you need
the date of origin,
the date of the event (if it occurred),
the date of last contact with the patient.
The date of origin
Rehospitalization
use date of first discharge.
Failure of a mechanical device
use date of implant.
Divorce
use date of marriage.
Loan default
use date of loan contract.
Infectious disease
use date of first exposure.
The date of the event
Define your event precisely
All-cause mortality
Mortality related to the health condition
Composite endpoints (e.g., death or relapse)
Requires comparing the earlier of two dates.
If the event did NOT happen, leave this field blank/missing.
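For a composite endpoint, taking the earlier of two dates while respecting missing values might look like the following Python sketch. The function name `composite_event_date` and the example dates are hypothetical, and `None` stands in for a blank field.

```python
from datetime import date

def composite_event_date(*component_dates):
    """Return the earliest non-missing date, or None if no event occurred."""
    observed = [d for d in component_dates if d is not None]
    return min(observed) if observed else None

# Hypothetical patients: relapse then death, death only, neither.
both = composite_event_date(date(2001, 6, 1), date(2001, 3, 15))
death_only = composite_event_date(date(2001, 6, 1), None)
neither = composite_event_date(None, None)
print(both, death_only, neither)
```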
The date of last contact
If event did not occur
Must be specified
Typically last medical exam or last telephone contact
If event did occur
Make same as event date, or
Leave blank
Survival calculations, 1 of 2
Time = max(Date of event, Date of last contact) - Date of origin
Censoring variable = 0 if Date of event is missing, 1 if not
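The two formulas above can be sketched in Python as follows. The function name `survival_record` and the example dates are hypothetical, `None` stands in for a blank date, and the maximum is taken over whichever of the two dates is present.

```python
from datetime import date

def survival_record(origin, event, last_contact):
    """Return (time_in_days, status); status is 1 for an event, 0 if censored."""
    candidates = [d for d in (event, last_contact) if d is not None]
    if not candidates:
        raise ValueError("need an event date or a date of last contact")
    end = max(candidates)
    status = 0 if event is None else 1
    return (end - origin).days, status

# Hypothetical patients sharing the same date of origin.
origin = date(1997, 1, 13)
censored = survival_record(origin, None, date(1997, 2, 12))
died = survival_record(origin, date(1997, 1, 20), None)
print(censored, died)
```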